silver label
Performance of weakly-supervised electronic health record-based phenotyping methods in rare-outcome settings
Hong, Yunjing, Nelson, Jennifer C., Williamson, Brian D.
Accurately identifying patients with specific medical conditions is a key challenge when using clinical data from electronic health records. Our objective was to comprehensively assess when weakly-supervised prediction methods, which use silver-standard labels (proxy measures of the true outcome) rather than gold-standard true labels, perform well in rare-outcome settings like vaccine safety studies. We compared three methods (PheNorm, MAP, and sureLDA) that combine structured features and features derived from clinical text using natural language processing, through an extensive simulation study with data-generating mechanisms ranging from simple to complex, varying outcome rates, and varying degrees of silver-label informativeness. We also considered using predicted probabilities to design a chart review validation study. No single method dominated the others across all prediction performance metrics. Probability-guided sampling selected a cohort enriched for patients with more mentions of important concepts in chart notes. SureLDA, the most complex of the three algorithms we considered, often performed well in simulations. Performance depended greatly on selected tuning parameters. Care should be taken when using weakly-supervised prediction methods in rare-outcome settings, particularly if the probabilities will be used in downstream analysis, but these methods can work well when silver labels are strong predictors of true outcomes.
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Alaska (0.04)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
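Below is a minimal sketch of the probability-guided chart-review sampling the abstract describes: patients are drawn for manual validation with probability proportional to the phenotyping model's predicted probability, which enriches the review set for likely cases in a rare-outcome setting. The function name, the probability floor, and the inverse-probability weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def probability_guided_sample(patient_ids, predicted_probs, n_charts, seed=None):
    """Draw patients for manual chart review with probability proportional to the
    phenotyping model's predicted outcome probability, enriching the review set
    for likely cases in a rare-outcome setting."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(predicted_probs, dtype=float)
    p = (probs + 1e-6) / (probs + 1e-6).sum()   # small floor keeps every patient sampleable
    chosen = rng.choice(len(p), size=n_charts, replace=False, p=p)
    weights = 1.0 / (n_charts * p[chosen])      # approximate inverse inclusion weights
    return [patient_ids[i] for i in chosen], weights

# Example: select 50 of 10,000 patients scored by a weakly-supervised model.
scores = np.random.default_rng(0).beta(0.5, 20, size=10_000)   # rare outcome -> mostly low scores
sample, weights = probability_guided_sample(list(range(10_000)), scores, n_charts=50, seed=1)
```

The returned weights would be needed in any downstream validation analysis, since probability-guided sampling deliberately over-represents likely cases.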
Interpretable phenotyping of Heart Failure patients with Dutch discharge letters
Torri, Vittorio, Boonstra, Machteld J., van de Veerdonk, Marielle C., Kalkman, Deborah N., Uijl, Alicia, Ieva, Francesca, Abu-Hanna, Ameen, Asselbergs, Folkert W., Calixto, Iacer
Objective: Heart failure (HF) patients present with diverse phenotypes affecting treatment and prognosis. This study evaluates models for phenotyping HF patients based on left ventricular ejection fraction (LVEF) classes, using structured and unstructured data, assessing performance and interpretability. Materials and Methods: The study analyzes all HF hospitalizations at both Amsterdam UMC hospitals (AMC and VUmc) from 2015 to 2023 (33,105 hospitalizations, 16,334 patients). Data from AMC were used for model training, and from VUmc for external validation. The dataset was unlabelled and included tabular clinical measurements and discharge letters. Silver labels for LVEF classes were generated by combining diagnosis codes, echocardiography results, and textual mentions. Gold labels were manually annotated for 300 patients for testing. Multiple Transformer-based (black-box) and Aug-Linear (white-box) models were trained and compared with baselines on structured and unstructured data. To evaluate interpretability, two clinicians annotated 20 discharge letters by highlighting information they considered relevant for LVEF classification. These were compared to SHAP and LIME explanations from black-box models and the inherent explanations of Aug-Linear models. Results: BERT-based and Aug-Linear models, using discharge letters alone, achieved the highest classification results (AUC=0.84 for BERT, 0.81 for Aug-Linear on external validation), outperforming baselines. Aug-Linear explanations aligned more closely with clinicians' explanations than post-hoc explanations on black-box models. Conclusions: Discharge letters emerged as the most informative source for phenotyping HF patients. Aug-Linear models matched black-box performance while providing clinician-aligned interpretability, supporting their use in transparent clinical decision-making.
- Europe > Netherlands > North Holland > Amsterdam (0.26)
- North America > United States (0.14)
- Europe > Italy > Lombardy > Milan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
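As a concrete illustration of the silver-labelling step described above, here is a minimal sketch that assigns an LVEF class from a structured echocardiography value when available and otherwise from a textual LVEF mention in the discharge letter. The HFrEF/HFmrEF/HFpEF cut-offs, the regular expression, and the function names are illustrative assumptions, not the study's exact rules.

```python
import re
from typing import Optional

def lvef_class_from_value(lvef: float) -> str:
    # Common convention: reduced < 40%, mildly reduced 40-49%, preserved >= 50%.
    if lvef < 40:
        return "HFrEF"
    if lvef < 50:
        return "HFmrEF"
    return "HFpEF"

LVEF_PATTERN = re.compile(r"(?:LVEF|ejection fraction)\D{0,15}(\d{1,2})\s*%", re.IGNORECASE)

def silver_label(echo_lvef: Optional[float], letter_text: str) -> Optional[str]:
    """Prefer the structured echo value; fall back to a textual LVEF mention."""
    if echo_lvef is not None:
        return lvef_class_from_value(echo_lvef)
    match = LVEF_PATTERN.search(letter_text)
    if match:
        return lvef_class_from_value(float(match.group(1)))
    return None  # no silver label -> hospitalization left unlabelled

print(silver_label(None, "Echocardiografie: LVEF 35%, gedilateerd LV."))  # HFrEF
```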
Reduction of Supervision for Biomedical Knowledge Discovery
Theodoropoulos, Christos, Coman, Andrei Catalin, Henderson, James, Moens, Marie-Francine
Knowledge discovery is hindered by the increasing volume of publications and the scarcity of extensive annotated data. To tackle the challenge of information overload, it is essential to employ automated methods for knowledge extraction and processing. Finding the right balance between the level of supervision and the effectiveness of models poses a significant challenge. While supervised techniques generally result in better performance, they have the major drawback of demanding labeled data. This requirement is labor-intensive and time-consuming and hinders scalability when exploring new domains. In this context, our study addresses the challenge of identifying semantic relationships between biomedical entities (e.g., diseases, proteins) in unstructured text while minimizing dependency on supervision. We introduce a suite of unsupervised algorithms based on dependency trees and attention mechanisms and employ a range of pointwise binary classification methods. Transitioning from weakly supervised to fully unsupervised settings, we assess the methods' ability to learn from data with noisy labels. The evaluation on biomedical benchmark datasets explores the effectiveness of the methods. Our approach tackles a central issue in knowledge discovery: balancing performance with minimal supervision. By gradually decreasing supervision, we assess the robustness of pointwise binary classification techniques in handling noisy labels, revealing their capability to shift from weakly supervised to entirely unsupervised scenarios. Comprehensive benchmarking offers insights into the effectiveness of these techniques and points toward adaptable knowledge discovery systems, marking progress in data-efficient methodologies for extracting useful insights when annotated data is limited.
- Europe (1.00)
- North America > United States > Minnesota (0.28)
- Asia > Middle East > UAE (0.28)
- Overview (0.93)
- Research Report > New Finding (0.46)
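One low-supervision ingredient the abstract alludes to, a dependency-tree signal between two entity mentions, can be sketched as the shortest dependency path between their head tokens; a short path through a connective such as "associated with" is weak evidence of a semantic relation. The snippet uses a general-purpose spaCy model for brevity (a biomedical pipeline would be the natural substitute) and is an assumption-laden sketch, not the authors' algorithm.

```python
# Requires: python -m spacy download en_core_web_sm
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def shortest_dependency_path(sentence: str, head1: str, head2: str):
    """Return the tokens on the shortest dependency path between two entity heads."""
    doc = nlp(sentence)
    graph = nx.Graph()
    for token in doc:
        for child in token.children:
            graph.add_edge(token.i, child.i)
    # Naive token lookup by surface form; fine for this single-sentence example.
    idx = {token.text.lower(): token.i for token in doc}
    path = nx.shortest_path(graph, idx[head1.lower()], idx[head2.lower()])
    return [doc[i].text for i in path]

# Tokens on a short path ("associated", "with") hint at a candidate relation.
print(shortest_dependency_path(
    "BRCA1 mutations are associated with hereditary breast cancer.",
    "mutations", "cancer"))
```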
Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection
Lee, Beomseok, Gaido, Marco, Calapodescu, Ioan, Besacier, Laurent, Negri, Matteo
As in any data-intensive domain, collecting high-quality datasets is a fundamental and costly prerequisite for the development of speech-processing applications. Traditional methods heavily rely on human workforce, whose costs, as data collection scales, are hard to sustain. In the quest for scalable solutions to tackle this problem, crowdsourcing emerged as a viable option that also enables the coverage of diverse populations (Cefkin et al., 2014; Poesio et al., 2017). Due to the variable quality of crowd-sourced data, validation methods that discard low-quality contributions are essential to build reliable datasets (Negri et al., 2011; Sabou et al., 2014; Chittilappilly et al., 2016). This need is exacerbated in the collection of speech-text pairs, where [...] To fill this gap, this paper explores the use of speech foundation models (SFMs) to automatize the validation of crowdsourced speech data. To this aim, we investigate the employment of off-the-shelf SFMs such as Whisper and SeamlessM4T (Radford et al., 2022; Communication et al., 2023), along with machine translation (MT) models and grapheme-to-phoneme conversion (G2P). Through experiments on French, German, and Korean data, we test the integration of SFMs and crowdsourcing to reduce validation costs while preserving final data quality. Our results show that leveraging SFMs yields a cost reduction by over 40%, while maintaining high data quality, significantly improving the efficiency and scalability of crowd-sourced speech data collection.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Europe > United Kingdom > Scotland (0.04)
- Information Technology > Communications > Social Media > Crowdsourcing (1.00)
- Information Technology > Artificial Intelligence > Speech (1.00)
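A minimal sketch of the kind of SFM-based validation described above: transcribe the crowd worker's recording with an off-the-shelf model and accept it when the transcript is close enough to the prompt text. The packages (openai-whisper, jiwer), the model size, and the WER threshold are assumptions for illustration, not the paper's exact pipeline.

```python
import jiwer
import whisper

model = whisper.load_model("small")   # downloads the checkpoint on first use

def validate_recording(audio_path: str, prompt_text: str, max_wer: float = 0.3) -> bool:
    """Transcribe the contribution and accept it if the transcript is close enough
    to the text the crowd worker was asked to read."""
    hypothesis = model.transcribe(audio_path)["text"]
    wer = jiwer.wer(prompt_text.lower(), hypothesis.lower())
    return wer <= max_wer

# Recordings that fail the check would be routed to (more expensive) human validation.
```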
Multilingual Non-Factoid Question Answering with Silver Answers
Mishra, Ritwik, Vennam, Sreeram, Shah, Rajiv Ratn, Kumaraguru, Ponnurangam
Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It utilizes interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 370K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on manual annotations of 790 QA pairs from MuNfQuAD (the golden set), we observe that 98% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines. The APS model attained an accuracy of 80% and 72%, as well as a macro F1 of 72% and 66%, on the MuNfQuAD test set and the golden set, respectively. Furthermore, the APS model effectively generalizes to certain languages within the golden set, even after being fine-tuned on silver labels.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > India (0.05)
- Europe > United Kingdom > Scotland (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
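The silver-answer construction described above can be sketched as: find interrogative sub-headings in an article and treat the paragraphs that follow each one as its silver answer. The HTML tag names, the question heuristic, and the function name below are assumptions for illustration.

```python
from bs4 import BeautifulSoup

QUESTION_WORDS = ("what", "why", "how", "when", "where", "who", "which")

def extract_silver_qa_pairs(article_html: str):
    """Pair question-like sub-headings with the paragraphs that follow them."""
    soup = BeautifulSoup(article_html, "html.parser")
    pairs = []
    for heading in soup.find_all("h2"):
        text = heading.get_text(strip=True)
        if text.endswith("?") or text.lower().startswith(QUESTION_WORDS):
            answer_parts = []
            for sibling in heading.find_next_siblings():
                if sibling.name == "h2":      # stop at the next sub-heading
                    break
                if sibling.name == "p":
                    answer_parts.append(sibling.get_text(strip=True))
            if answer_parts:
                pairs.append({"question": text, "silver_answer": " ".join(answer_parts)})
    return pairs
```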
Language Model as an Annotator: Unsupervised Context-aware Quality Phrase Generation
Zhang, Zhihao, Zuo, Yuan, Lin, Chenghua, Wu, Junjie
Phrase mining is a fundamental text mining task that aims to identify quality phrases from context. Nevertheless, the scarcity of extensive gold-labeled datasets, which demand substantial annotation effort from experts, renders this task exceptionally challenging. Furthermore, the emerging, infrequent, and domain-specific nature of quality phrases presents further challenges. In this paper, we propose LMPhrase, a novel unsupervised context-aware quality phrase mining framework built upon large pre-trained language models (LMs). Specifically, we first mine quality phrases as silver labels by employing a parameter-free probing technique called Perturbed Masking on the pre-trained language model BERT (coined the Annotator). In contrast to typical statistics-based or distantly-supervised methods, our silver labels, derived from large pre-trained language models, take into account the rich contextual information contained in the LMs. As a result, they bring distinct advantages in preserving the informativeness, concordance, and completeness of quality phrases. Secondly, training a discriminative span prediction model heavily relies on massive annotated data and risks overfitting the silver labels. Alternatively, we formalize the phrase tagging task as a sequence generation problem by directly fine-tuning the sequence-to-sequence pre-trained language model BART on the silver labels (coined the Generator). Finally, we merge the quality phrases from both the Annotator and the Generator as the final predictions, considering their complementary nature and distinct characteristics. Extensive experiments show that LMPhrase consistently outperforms all existing competitors across two phrase mining tasks of different granularity, each tested on two datasets from different domains.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > China > Beijing > Beijing (0.04)
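For readers unfamiliar with Perturbed Masking, the probe behind the Annotator can be sketched as follows: the impact of token j on token i is the change in i's BERT representation when j is masked in addition to i, and comparing impacts between adjacent tokens drives the phrase segmentation. The sketch assumes each word maps to a single WordPiece and uses bert-base-uncased with Euclidean distance; these are illustrative choices, not necessarily the paper's configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def impact(words, i, j):
    """Impact of word j on word i: distance between word i's representation when
    only i is masked and when both i and j are masked (Perturbed Masking probe).
    Assumes every word is a single WordPiece, so word k sits at position k+1
    (offset by [CLS]) in the encoder output."""
    def encode(masked_positions):
        toks = [tokenizer.mask_token if k in masked_positions else w
                for k, w in enumerate(words)]
        enc = tokenizer(" ".join(toks), return_tensors="pt")
        return model(**enc).last_hidden_state[0]

    return torch.dist(encode({i})[i + 1], encode({i, j})[i + 1]).item()

words = "deep learning improves quality phrase mining".split()
# Premise of the probe: impact within a phrase ("quality" <- "phrase") should exceed
# impact across a phrase boundary ("improves" <- "quality"), marking where phrases start.
print(impact(words, 3, 4), impact(words, 2, 3))
```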
Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP
Du, Wei, Advani, Laksh, Gambhir, Yashmeet, Perry, Daniel J, Shiralkar, Prashant, Xing, Zhengzheng, Colak, Aaron
[...] natural language processing (NLP) tasks using the latest generative pretrained models such as GPT (OpenAI, 2023; Ouyang et al., 2022), PaLM (Chowdhery et al., 2022), and many others (Touvron et al., 2023; Bai et al., 2022; Penedo et al., 2023; Taori et al., 2023). This new generation of models opens up many new possibilities, including competitive performance in zero-shot and few-shot settings for tasks that have typically been modeled using a supervised setting (OpenAI, 2023). More established language models (BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), XLM-Roberta (Conneau et al., 2020b), etc.) provide a strong balance of inference cost and task performance for such systems. This broad class of large language models (LLMs) used for complex supervised NLP [...] More recently, Fu et al. (2023) create a meta-model responsible for predicting the accuracy of the LLM using the model's confidence scores as features. Methods from the computer vision (CV) domain to assess unlabeled data more generally have, for example, proposed the average threshold confidence method, which learns a threshold over the model's confidence and predicts accuracy as the fraction of unlabeled examples exceeding that threshold (Garg et al., 2022), or iteratively learn an ensemble of models to identify misclassified data points and perform self-training to improve the ensemble with the identified points (Chen et al., 2021). However, the metrics and hyperparameters in previous works are specifically for classification tasks and cannot be easily extended to more complex tasks.
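A minimal sketch of an ensemble disagreement score in the spirit of the title: the fraction of ensemble members that disagree with the majority vote serves as a gold-label-free proxy for where the deployed model is likely wrong. The aggregation below is a simple illustration, not the paper's exact metric.

```python
from collections import Counter
from typing import List, Sequence

def disagreement_scores(ensemble_predictions: Sequence[Sequence[str]]) -> List[float]:
    """ensemble_predictions[m][n] is model m's label for example n.
    Returns, per example, the fraction of models disagreeing with the majority vote."""
    scores = []
    for example_preds in zip(*ensemble_predictions):
        counts = Counter(example_preds)
        majority = counts.most_common(1)[0][1]
        scores.append(1.0 - majority / len(example_preds))
    return scores

# Three models labeling four utterances; the last example is the one to send for review.
preds = [
    ["positive", "negative", "neutral", "positive"],
    ["positive", "negative", "neutral", "negative"],
    ["positive", "negative", "neutral", "neutral"],
]
print(disagreement_scores(preds))  # roughly [0.0, 0.0, 0.0, 0.67]
```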
Fine-grained Contrastive Learning for Relation Extraction
Hogan, William, Li, Jiacheng, Shang, Jingbo
Recent relation extraction (RE) works have shown encouraging improvements by conducting contrastive learning on silver labels generated by distant supervision before fine-tuning on gold labels. Existing methods typically assume all these silver labels are accurate and treat them equally; however, distant supervision is inevitably noisy -- some silver labels are more reliable than others. In this paper, we propose fine-grained contrastive learning (FineCL) for RE, which leverages fine-grained information about which silver labels are and are not noisy to improve the quality of learned relationship representations for RE. We first assess the quality of silver labels via a simple and automatic approach we call "learning order denoising," where we train a language model to learn these relations and record the order of learned training instances. We show that learning order largely corresponds to label accuracy -- early-learned silver labels have, on average, more accurate labels than later-learned silver labels. Then, during pre-training, we increase the weights of accurate labels within a novel contrastive learning objective. Experiments on several RE benchmarks show that FineCL makes consistent and significant performance gains over state-of-the-art methods.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
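The "learning order denoising" idea above can be sketched on a toy problem: train in rounds, record the round at which each silver-labeled instance is first predicted correctly, and convert that order into per-instance weights (earlier-learned means more trustworthy) for a later contrastive objective. The scikit-learn classifier and the linear weighting below only illustrate the bookkeeping; the paper applies the idea to a language model.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
true_labels = (X[:, 0] + X[:, 1] > 0).astype(int)
silver = true_labels.copy()
flip = rng.random(500) < 0.2                      # 20% noisy silver labels
silver[flip] = 1 - silver[flip]

n_rounds = 10
first_correct = np.full(len(silver), n_rounds)    # "never learned" stays at n_rounds
clf = SGDClassifier(loss="log_loss", random_state=0)
for rnd in range(n_rounds):
    clf.partial_fit(X, silver, classes=[0, 1])    # one pass = one "epoch"
    newly = (first_correct == n_rounds) & (clf.predict(X) == silver)
    first_correct[newly] = rnd

weights = 1.0 - first_correct / n_rounds          # later-learned -> smaller weight
print("mean weight, clean vs. flipped labels:",
      weights[~flip].mean().round(2), weights[flip].mean().round(2))
```

These weights would then scale each instance's term in the contrastive pre-training loss, down-weighting silver labels the model only fits late (or never), which the paper finds are less accurate on average.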